Analysis of Breast Cancer Samples

Introduction

  • The data set is based on breast cancer samples.
  • Ten parameters, each reported as:
    • mean
    • standard error (SE)
    • worst
  • Each patient's diagnosis is given as benign (B, 1) or malignant (M, 0).

Materials and Methods

  • The data was split into 3 files, which were then joined

  • All irrelevant symbols in the column names were removed

  • Two files were created: clean and clean_binary
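
The joining and cleaning steps above can be sketched as follows. This is a minimal illustration in Python (standard library only), not the project's own code. The three tables, their column names, and the helpers clean_name and join_on_id are hypothetical; only the B -> 1, M -> 0 coding comes from the introduction.

```python
import re

# Hypothetical rows standing in for the three files; the real column
# names and values are not given in the slides.
ids = [{"id": 1, "diagnosis": "M"}, {"id": 2, "diagnosis": "B"}]
means = [{"id": 1, "radius (mean)": 17.9}, {"id": 2, "radius (mean)": 12.1}]
worst = [{"id": 1, "radius (worst)": 25.4}, {"id": 2, "radius (worst)": 13.5}]


def clean_name(name):
    """Lower-case a column name and replace runs of symbols with '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.lower()).strip("_")


def join_on_id(*tables):
    """Inner-join lists of row dicts on their shared 'id' key."""
    merged = {row["id"]: dict(row) for row in tables[0]}
    for table in tables[1:]:
        by_id = {row["id"]: row for row in table}
        merged = {i: {**row, **by_id[i]} for i, row in merged.items() if i in by_id}
    return list(merged.values())


joined = join_on_id(ids, means, worst)

# "clean": same rows, with symbols stripped from the column names.
clean = [{clean_name(k): v for k, v in row.items()} for row in joined]

# "clean_binary": diagnosis recoded as stated in the introduction,
# benign (B) -> 1 and malignant (M) -> 0.
clean_binary = [
    {**row, "diagnosis": 1 if row["diagnosis"] == "B" else 0} for row in clean
]

print(clean_binary)
```

Keeping both versions lets plots use the B/M labels while models use the numeric coding.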

Workflow for project

Boxplot - outliers
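
As a rough illustration of what a boxplot flags as an outlier, the sketch below applies the common 1.5 * IQR rule in Python. The slides do not say which rule the project's boxplots use, so the rule, the quartile method, and the example values are all assumptions.

```python
import statistics


def boxplot_outliers(values):
    """Return values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical measurements for one parameter, with one extreme value.
radius_mean = [11.2, 12.0, 12.4, 12.9, 13.1, 13.6, 14.2, 14.8, 28.1]
print(boxplot_outliers(radius_mean))
```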

Correlation between parameters

  • Correlation matrix computed on the numeric columns of clean_data
  • Matrix converted with tibble()
  • Re-arranged to plot the correlations as a heatmap
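
The re-arrangement step can be sketched as turning the square correlation matrix into one row per variable pair, which is the shape a heatmap layer typically expects. This is a Python illustration with hypothetical column names and values. The slides mention tibble(), so the project itself presumably did this in R.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical numeric columns standing in for clean_data.
clean_data = {
    "radius_mean": [11.0, 13.0, 15.0, 18.0, 20.0],
    "perimeter_mean": [70.0, 84.0, 98.0, 119.0, 133.0],
    "texture_mean": [18.0, 15.0, 21.0, 16.0, 19.0],
}

# Long format: one row per (var1, var2) pair, ready for a heatmap.
corr_long = [
    {"var1": a, "var2": b, "r": pearson(clean_data[a], clean_data[b])}
    for a in clean_data
    for b in clean_data
]

for row in corr_long:
    print(f'{row["var1"]:>15} {row["var2"]:>15} {row["r"]:6.2f}')
```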

Augmenting the data

Linear regression of each variable

  • Group by variable and nest

  • Mutate to fit the model formula by mapping over the nested data

  • Map the tidy function over the fitted models

  • Unnest the models to extract statistics

  • Add a significance column based on the q-value
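
The per-variable loop above can be sketched in Python as follows. Several things here are assumptions, not taken from the slides: the model form (each measurement regressed on the binary diagnosis), the normal approximation used for the p-value, Benjamini-Hochberg as the q-value method, the 0.05 cut-off, and all data values.

```python
import math
from statistics import NormalDist


def fit_simple_lm(x, y):
    """Least-squares fit of y = a + b*x; returns (slope, approximate p-value)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    # Assumption: normal approximation to the t distribution, used only
    # to stay within the standard library.
    p_value = 2 * (1 - NormalDist().cdf(abs(slope / std_err)))
    return slope, p_value


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # rank of this p-value in ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical data: diagnosis coded as in the introduction (M = 0, B = 1).
diagnosis = [0, 0, 0, 0, 1, 1, 1, 1]
measurements = {
    "radius_mean": [17.5, 19.2, 20.1, 18.3, 12.1, 11.4, 13.0, 12.6],
    "texture_mean": [19.0, 21.5, 17.2, 20.3, 18.1, 20.9, 17.8, 19.6],
}

# One model per variable, collected into one row per variable.
results = []
for name, values in measurements.items():
    slope, p_value = fit_simple_lm(diagnosis, values)
    results.append({"variable": name, "estimate": slope, "p_value": p_value})

# Add q-values and a significance flag (0.05 threshold is an assumption).
for row, q in zip(results, bh_adjust([r["p_value"] for r in results])):
    row["q_value"] = q
    row["significant"] = q < 0.05

for row in results:
    print(row)
```

Adjusting the p-values across all variables at once matters here because one test is run per parameter.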

Analysis 1 - PCA unsupervised

  • PCA performed on numeric, scaled data

  • PC1 and PC2 plotted and colored by diagnosis

  • Clear clustering of the two diagnoses

  • Plot showing variance explained by each PC

  • More than 40% of the variance is explained by PC1
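
To make "variance explained by PC1" concrete, the sketch below scales a few columns, builds their correlation matrix, and finds the leading component by power iteration. This is a Python illustration with hypothetical data, not the project's PCA call. A full PCA routine would return all components. Power iteration is used only to keep the example dependency-free.

```python
import math


def standardize(column):
    """Centre a column and scale it to unit sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]


def correlation_matrix(scaled_columns):
    """Correlation matrix of columns that are already standardized."""
    n = len(scaled_columns[0])
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in scaled_columns]
        for ci in scaled_columns
    ]


def first_component(matrix, iterations=500):
    """Leading eigenvalue and unit eigenvector via power iteration."""
    # Uneven start vector, to make an accidental orthogonal start unlikely.
    vec = [1.0 / (i + 1) for i in range(len(matrix))]
    for _ in range(iterations):
        product = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in product))
        vec = [x / norm for x in product]
    product = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
    eigenvalue = sum(v * p for v, p in zip(vec, product))
    return eigenvalue, vec


# Hypothetical numeric columns; the real data has 30 of them.
columns = [
    [11.0, 13.0, 15.0, 18.0, 20.0],      # radius_mean
    [70.0, 84.0, 98.0, 119.0, 133.0],    # perimeter_mean
    [18.0, 15.0, 21.0, 16.0, 19.0],      # texture_mean
]

scaled = [standardize(c) for c in columns]
eigenvalue, loadings = first_component(correlation_matrix(scaled))

# For scaled data the total variance equals the number of columns.
explained_pc1 = eigenvalue / len(columns)

# PC1 score of each sample: projection of its scaled row onto the loadings.
pc1_scores = [sum(l * col[i] for l, col in zip(loadings, scaled)) for i in range(5)]

print(f"PC1 explains {explained_pc1:.0%} of the variance")
```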

Analysis 2 - Random Forest Classifier

Workflow of supervised classifier

Analysis 2 - ROC-Curve

  • Good performance (curves hug the top-left corner)
  • All 5 folds are consistently positioned
  • Across diverse patient cases, the model gives reliable predictions
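
For reference, an ROC curve and its area under the curve (AUC) can be computed from predicted scores as sketched below, once per fold. This is a Python illustration. The labels and scores are hypothetical and are not the project's random forest output.

```python
def roc_curve(labels, scores):
    """Return (FPR, TPR) points, sweeping the threshold from high to low."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


# Hypothetical held-out predictions for two folds: (true labels, scores).
folds = {
    "fold 1": ([1, 1, 0, 1, 0, 0], [0.95, 0.85, 0.70, 0.65, 0.30, 0.10]),
    "fold 2": ([1, 0, 1, 1, 0, 0], [0.90, 0.20, 0.80, 0.75, 0.40, 0.15]),
}

fold_auc = {name: auc(roc_curve(y, s)) for name, (y, s) in folds.items()}
for name, value in fold_auc.items():
    print(f"{name}: AUC = {value:.3f}")
```

A curve that hugs the top-left corner corresponds to an AUC close to 1, and similar AUC values across folds are what "consistently positioned" means numerically.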

Discussion

  • Malignant cases:
    1. Greater heterogeneity observed
    2. Diverse genetic and molecular profiles likely contribute
    3. Reflects the complexity
  • Benign cases:
    1. More similar in their attributes
    2. More consistent set of features
  • General trends:
    1. Malignant tumors generally have higher values for most measurements
    2. This highlights distinct patterns between malignant and benign cases
  • Significant correlations:
    1. Positive correlation between radius, area and perimeter
    2. Radius has a slight correlation with concavity and compactness
    3. Size-related parameters interconnected in tumor morphology
  • Limited correlation:
    1. Texture, symmetry, smoothness and fractal dimension show minimal correlation, suggesting potential independence